Abstract: We present a new information-theoretic definition
and associated results, based on list decoding in a source
coding setting. We begin by presenting list-source codes,
which naturally map a key length (entropy) to list size. We
then show that such codes can be analyzed in the context of
a novel information-theoretic metric, $\epsilon$-symbol
secrecy, that encompasses both the one-time pad and
traditional rate-based asymptotic metrics, but, like most
cryptographic constructs, can be applied in non-asymptotic
settings. We derive fundamental bounds for $\epsilon$-symbol
secrecy and demonstrate how these bounds can be achieved
with MDS codes when the source is uniformly distributed. We
discuss applications and implementation issues of our codes.

I. Introduction 

Classic information-theoretic approaches to secrecy 
are concerned with unconditionally secure systems, i.e. 
schemes that manage to hide all the bits of a message 
from an adversary with unbounded computational re- 
sources. It is well known that, for a noiseless setting, 
unconditional (i.e. perfect) secrecy can only be attained 
when both communicating parties share a random key 
with entropy at least as large as the message itself [1]. In
other cases, perfect secrecy can sometimes be achieved 
by exploiting particular characteristics of the considered 
model, such as when the legitimate communicating party 
has a less noisy channel than the eavesdropper (wiretap 
channel) [2].

Alternatively, computationally secure cryptosystems 
have thrived both from a theoretical and a practical
perspective. Such systems are based on yet unproven
hardness assumptions, but nevertheless have led to
cryptographic schemes that are widely adopted (for an
overview, see [3]). Currently, computationally secure
schemes are used millions of times per day, in ap- 
plications that range from online banking transactions 
to digital rights management. However, with the ever 
increasing amount of data streaming over the Internet 
and the need to provide secure connections to mobile 
low powered devices, there is still a constant demand 
for new and efficient security solutions. 

There has been a long exploration of the connection
between coding and cryptography [4], and our work is
inscribed in this school of thought. From a theoretical
perspective, we aim to present a new framework that
allows the application of information-theoretic tools to
analyze a broader set of secrecy schemes that go beyond
the one-time pad and the wiretap model with its
variations. Towards this goal, we define a new metric for
analyzing security, namely $\epsilon$-symbol secrecy, which
quantifies the uncertainty of specific source symbols given
an encrypted source sequence. This metric subsumes
traditional rate-based information-theoretic measures of
secrecy which, unlike usual cryptographic approaches,
are generally asymptotic. However, our definition is not
asymptotic and, indeed, we provide a construction that
achieves fundamental symbol secrecy bounds, based on
MDS codes, for finite-length sequences.

In order to construct schemes that achieve symbol 
secrecy performance bounds, we present the definition 
of list-source codes, which are codes that compress a 
source sequence below its entropy rate. Consequently, 
a list-source code is decoded to a list of possible 
source sequences instead of a unique source sequence. 
Fundamental bounds for list-source codes are derived, 
and explicit constructions that achieve such bounds are
presented using tools from algebraic coding theory.

We show how list-source codes can be used as an 
important tool for hiding information with key sizes that 
are only a fraction of the entropy of the message. Using 
list-source codes, it becomes possible to argue that the 
best an adversary can do is to reduce the set of possible 
messages to an exponentially sized list with certain prop- 
erties, where the size of the list depends on the length of 
the key. Since the list has an exponential size, it cannot 
be resolved in polynomial time, offering a certain level of 
computational security. We will show how this property 
can be used to develop hybrid encryption schemes, where 
only part of the message needs to be securely encrypted. 

Our main practical application of interest is secure 
content caching and distribution. We propose a hybrid 
encryption scheme based on list-source codes, where 
a large fraction of the message can be encoded and 
distributed using a key-independent list-source code. The 
information necessary to resolve the decoding list, which 
can be much smaller than the whole message, is then 
encrypted using a secure method. This scheme allows 
a significant amount of content to be distributed and 
cached before dealing with key generation, distribution 
and management issues. 

A. Related work 

Tools from algebraic coding theory have been widely
used for constructing secrecy schemes [4]. In addition,
the notion of providing security by exploiting the fact
that the adversary has incomplete access to information
is also central to several secure network coding
schemes and wiretap models. Ozarow and Wyner [5]
introduced the wiretap channel II, where an adversary
can observe a set of k symbols of his choice out of n
transmitted symbols, and proved that there exists a code
that achieves perfect secrecy. A generalized version of
this model was investigated by Cai and Yeung in [6],
where they introduce the related problem of designing
an information-theoretically secure linear network code
when an adversary can observe a certain number of edges
in the network. Their results were later extended in
[7]-[10]. A more practical approach was presented by Lima
et al. in [11]. For a survey on the theory of secure
network coding, we refer the reader to [12].

The setting considered in this paper is related to 
the wiretap channel II in that a fraction of the source 
symbols is hidden from a possible adversary. Oliveira et 
al. investigated in [13] a related setting in the context of
data storage over untrusted networks that do not collude, 
introducing a solution based on Vandermonde matrices. 
The MDS coding scheme introduced in this paper is
similar to [13], albeit the framework developed here is
more general.

List decoding techniques for channel coding were
first introduced by Elias [14] and Wozencraft [15], with
subsequent work by Shannon et al. [16], [17] and Forney
[18]. Recently, new algorithmic results for list decoding
of channel codes were discovered by Guruswami and
Sudan [19]. We refer the reader to [20] for an excellent
survey of list decoding results. List decoding has been
considered in the context of source coding in [21]. The
approach is related to the one presented here, since we
may view a secret key as side information, but [21] does
not consider source coding and list decoding together for
the purposes of security.

B. Communication and threat model 

A transmitter (Alice) sends to a legitimate receiver
(Bob) a sequence of length n produced by a discrete
source X with output alphabet $\mathcal{X}$ and probability
distribution $p_X(\cdot)$. Both Alice and Bob have access to a
shared secret key K drawn uniformly at random
from a discrete alphabet $\mathcal{K}$, such that $H(K) < H(X^n)$,
and encryption/decryption functions
$\mathrm{Enc}: \mathcal{X}^n \times \mathcal{K} \to \mathcal{C}$
and $\mathrm{Dec}: \mathcal{C} \times \mathcal{K} \to \mathcal{X}^n$,
where $\mathcal{C}$ is the set of possible encrypted messages.
In addition, Alice communicates with Bob over a noiseless
channel. Alice observes the source sequence $X^n$, and
transmits an encrypted message $C = \mathrm{Enc}(X^n, K)$.
Bob then decrypts the message using the key, recovering
$\hat{X}^n = \mathrm{Dec}(C, K)$. The communication is
successful if $\hat{X}^n = X^n$.

We assume a passive but computationally unbounded 
eavesdropper (Eve) that has access to all transmitted 
messages from Alice to Bob and knows the functions 
$\mathrm{Enc}(\cdot)$ and $\mathrm{Dec}(\cdot)$, but does not know the secret key K.
Eve's goal is to gain as much knowledge as possible 
about the original source sequence. This is the tradi- 
tional framework used in cryptography, and no degraded 
assumption is made beyond the shared secret key. 

In the remainder of this paper we investigate two main 
aspects of this model, described below. 

1) Encryption with key entropy smaller than the message
entropy: We initially analyze how to perform
encryption when the key is smaller than the message.
Towards this goal, we present the definition of
list-source codes (LSCs), together with fundamental bounds,
in section II. Furthermore, practical code constructions
of LSCs are introduced in section III. We present
list-source codes as codes that compress the source sequence
below its entropy rate, and in section III describe how
LSCs can be used in the considered model.

2) Security analysis and new security metrics for i.i.d.
sources: We analyze the security of schemes based
on LSCs in section IV. In addition, we introduce a
new information-theoretic metric that can be used in
scenarios where perfect secrecy cannot be achieved,
namely absolute and $\epsilon$-symbol secrecy.

In section V we discuss the extension of LSCs to
Markovian source models, and in section VI we present
applications and practical considerations of the proposed
secrecy scheme. Finally, section VII presents our
concluding remarks.

II. List decoding and source coding: 
Fundamental Limits 

In this section we present the definition of list-source 
codes and derive fundamental bounds. Consider a
discrete memoryless source X with output alphabet $\mathcal{X}$ and
probability distribution $p_X(\cdot)$.

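Definition 1. A $(2^{nR}, |\mathcal{X}|^{nL}, n)$-list-source code for a
discrete memoryless source X consists of an encoding
function $f_n : \mathcal{X}^n \to \{1, \dots, 2^{nR}\}$ and a list-decoding
function $g_n : \{1, \dots, 2^{nR}\} \to \mathcal{P}(\mathcal{X}^n) \setminus \emptyset$,
where $\mathcal{P}(\mathcal{X}^n)$ is the power set of $\mathcal{X}^n$ and
$|g_n(w)| = |\mathcal{X}|^{nL}$ for all $w \in \{1, \dots, 2^{nR}\}$.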

Note that $0 \le L \le 1$. From an operational point
of view, L is a parameter that determines the size of
the decoded list. For example, $L = 0$ corresponds
to traditional lossless compression, i.e., each source
sequence is decoded to a unique sequence. Furthermore,
$L = 1$ represents the trivial case when the decoded list
corresponds to $\mathcal{X}^n$.

For a list-source code, an error is declared when a 
string generated by a source is not contained in the 
corresponding decoded list. The average error probability 
is given by

\[ e_L(f_n, g_n) = \Pr\big(X^n \notin g_n(f_n(X^n))\big). \tag{1} \]



Definition 2. For a given discrete memoryless source
X, the rate list size pair $(R, L)$ is said to be achievable
if for every $\delta > 0$, $0 < \epsilon < 1$ and sufficiently large n
there exists a sequence of $(2^{nR_n}, |\mathcal{X}|^{nL_n}, n)$-list-source
codes $(f_n, g_n)$ such that $R_n < R + \delta$, $|L_n - L| < \delta$ and
$e_{L_n}(f_n, g_n) \le \epsilon$. The rate list region is the closure of
all rate list pairs $(R, L)$.

Definition 3. The rate list function $R(L)$ is the infimum
of all rates R such that $(R, L)$ is in the rate list region
for a given normalized list size $0 \le L \le 1$.

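Proposition 1. For any discrete memoryless source X,
the rate list function is bounded below by

\[ R(L) \ge H(X) - L \log|\mathcal{X}|. \tag{2} \]

Fig. 1. Rate list region for normalized list size L and code rate R.

Proof: Let $\delta > 0$ be given and $(f_n, g_n)$ be a
sequence of codes with (normalized) list size $L_n$ such
that $L_n \to L$ and, for any $0 < \epsilon < 1$ and n sufficiently
large, $0 \le e_{L_n}(f_n, g_n) \le \epsilon$. Then

\begin{align}
\Pr\Big[ X^n \in \bigcup_{w \in \mathcal{W}^n} g_n(w) \Big]
  &\ge \Pr\big[ X^n \in g_n(f_n(X^n)) \big] \tag{3} \\
  &\ge 1 - \epsilon, \tag{4}
\end{align}

where $\mathcal{W}^n = \{1, \dots, 2^{nR_n}\}$ and $R_n$ is the rate of the
code $(f_n, g_n)$. Using [22, Lemma 2.14]:

\begin{align}
\frac{1}{n} \log\Big( \sum_{w \in \mathcal{W}^n} |g_n(w)| \Big)
  &= \frac{1}{n} \log\big( 2^{nR_n} |\mathcal{X}|^{nL_n} \big) \notag \\
  &= R_n + L_n \log|\mathcal{X}| \notag \\
  &\ge \frac{1}{n} \log \Big| \bigcup_{w \in \mathcal{W}^n} g_n(w) \Big| \notag \\
  &\ge H(X) - \delta \tag{5}
\end{align}

if $n \ge n_0(\delta, \epsilon, |\mathcal{X}|)$. Since this holds for any $\delta > 0$,
it follows that $R(L) \ge H(X) - L \log|\mathcal{X}|$ for all n
sufficiently large.

■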

Remark 1. Achievability of the bound (2) will be shown
through an explicit design using linear codes in the next
section, so the inequality can be proved to be an equality.
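
As a quick numerical illustration of bound (2), consider a binary source with $\Pr(X = 1) = 0.11$. The source parameters and list sizes below are chosen here purely for illustration and do not come from the analysis above; $h_b$ denotes the binary entropy function.

```latex
\begin{align*}
|\mathcal{X}| &= 2, \qquad H(X) = h_b(0.11) \approx 0.5 \text{ bits}, \\
L = 0.25 &: \quad R(L) \ge 0.5 - 0.25 \cdot \log_2 2 = 0.25 \text{ bits per symbol}, \\
L = 0.5  &: \quad R(L) \ge 0.5 - 0.5 \cdot \log_2 2 = 0.
\end{align*}
```

For $L \ge H(X)/\log|\mathcal{X}|$ the right-hand side of (2) is non-positive, so the bound carries no information in that range, since rates are nonnegative.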

III. Code design 

A. Trivial approach 

Assume that the source X is uniformly distributed
in $\mathbb{F}_q$, i.e., $\Pr(X = x) = 1/q$ for all $x \in \mathbb{F}_q$. In this case
$R(L) = (1 - L) \log q$. A trivial scheme for achieving the
list-source boundary is the following. Consider a source
sequence $X^n = (X^p, X^s)$, where $X^p$ denotes the first
$p = n - \lfloor Ln \rfloor$ symbols of $X^n$ and $X^s$ denotes the last
$s = \lfloor Ln \rfloor$ symbols. Encoding is done by discarding $X^s$,
and mapping the prefix $X^p$ to a binary codeword $Y^{nR}$
of length $nR = \lceil (n - \lfloor Ln \rfloor) \log q \rceil$ bits.

For decoding, the codeword $Y^{nR}$ is mapped to $X^p$,
and the scheme outputs a list of size $q^s$ composed
by $X^p$ concatenated with all possible combinations of
suffixes of length s. Clearly, for n sufficiently large,
$R \approx (1 - L) \log q$, and we achieve the optimal list-source
size tradeoff.
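
The trivial scheme can be sketched in a few lines. This is only an illustration under assumed toy parameters (q = 3, n = 4, L = 0.5); the function names `trivial_encode` and `trivial_decode` are hypothetical, and the mapping of the prefix to a binary codeword is omitted.

```python
from itertools import product

def trivial_encode(x, s):
    """Keep the first len(x) - s symbols; the last s symbols are discarded."""
    return tuple(x[:len(x) - s])

def trivial_decode(prefix, s, q):
    """Return the decoded list: the prefix followed by every suffix in {0..q-1}^s."""
    return [prefix + suffix for suffix in product(range(q), repeat=s)]

# Toy parameters (assumptions for illustration only).
q, n, L = 3, 4, 0.5
s = int(L * n)                 # number of discarded symbols, floor(L * n)
x = (2, 0, 1, 1)               # example source sequence over {0, 1, 2}

code = trivial_encode(x, s)
decoded = trivial_decode(code, s, q)
print(len(decoded), x in decoded)
```

Every element of the list shares the same prefix, which is exactly the weakness discussed next: the uncertainty sits entirely in the last s symbols.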

The previous scheme is completely inadequate for
security purposes. An adversary that observes the binary
codeword $Y^{nR}$ can uniquely identify the first p symbols
of the source message, and the uncertainty is concentrated
over the last s symbols. Ideally, assuming that
all source symbols are of equal importance, we should
spread the uncertainty over all symbols of the message.
More precisely, given the encoding $f(X^n)$, a "good"
security scheme would provide $I(X_i; f(X^n)) \le \epsilon \ll \log q$
for $1 \le i \le n$. Of course, we can naturally extend this
notion for groups of symbols or functions over input
symbols$^1$. This idea will be captured in the definition of
symbol secrecy, introduced in section IV.

B. A construction based on linear codes 

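Let X be an i.i.d. source with $X \in \mathcal{X}$ with
entropy $H(X)$, and $\mathcal{S}_n$ a source code with encoder
$s_n : \mathcal{X}^n \to \mathbb{F}_q^{m_n}$ and decoder $r_n : \mathbb{F}_q^{m_n} \to \mathcal{X}^n$.

Furthermore, let $\mathcal{C}$ be a $(m_n, k_n, d)$ linear code over $\mathbb{F}_q$
with an $(m_n - k_n) \times m_n$ parity check matrix $\mathbf{H}_n$ (i.e.
$c \in \mathcal{C} \Leftrightarrow \mathbf{H}_n c = 0$). Consider the following scheme,
where $k_n = n L_n \log|\mathcal{X}| / \log q$ for $0 \le L_n \le 1$ and
$L_n \to L$ as $n \to \infty$. To simplify notation, we assume
without loss of generality that $k_n$ is an integer.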

Scheme 1. Encoding: Let $X^n$ be the sequence generated
by the source. Compute the syndrome
$S^{m_n - k_n} = \mathbf{H}_n s_n(X^n)$ and map each syndrome to a distinct
sequence of $nR = \lceil (m_n - k_n) \log q \rceil$ bits, denoted by $Y^{nR}$.

Decoding: Map the binary codeword $Y^{nR}$ to the
corresponding syndrome $S^{m_n - k_n}$. Output $r_n(x^{m_n})$ for
each $x^{m_n}$ in the coset of $\mathbf{H}_n$ corresponding to $S^{m_n - k_n}$.
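
A minimal sketch of the encoding and list-decoding steps of Scheme 1, under simplifying assumptions that are not part of the scheme itself: the field is a prime field GF(5), the source code $s_n$ is the identity (as for a uniform source), the 2 x 4 parity check matrix is an arbitrary toy choice, and list decoding is done by brute-force enumeration, which is only feasible at toy sizes.

```python
from itertools import product

def syndrome(H, x, p):
    """Compute H x over GF(p), with H given as a list of rows."""
    return tuple(sum(h * v for h, v in zip(row, x)) % p for row in H)

def list_decode(H, s, p):
    """Enumerate the coset {x : H x = s} by brute force (toy sizes only)."""
    n = len(H[0])
    return [x for x in product(range(p), repeat=n) if syndrome(H, x, p) == s]

# Toy parameters (assumptions for illustration only).
p = 5
H = [[1, 1, 1, 1],
     [1, 2, 3, 4]]          # (n - k) x n with n = 4, k = 2
x = (3, 0, 4, 1)            # source-coded sequence

s = syndrome(H, x, p)       # what is transmitted
coset = list_decode(H, s, p)  # what a decoder (or eavesdropper) can list
print(s, len(coset))
```

With a full-rank H, each coset contains $q^k$ sequences, here $5^2 = 25$, and the true sequence is one of them.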

Proposition 2. If $\mathcal{S}_n$ is asymptotically optimal for source
X, i.e. $m_n/n \to H(X)/\log q$, scheme 1 achieves the
optimal list-source tradeoff point $R(L)$ for an i.i.d.
source, where $R(\cdot)$ is the rate list function.

$^1$This idea is tightly related to the concept of hard core predicates
and semantic security in cryptography.



Proof: Since the size of each coset corresponding
to a syndrome $S^{m_n - k_n}$ is exactly $q^{k_n}$, the normalized
list size is $L_n = (k_n \log q)/(n \log|\mathcal{X}|) \to L$. Denoting
$m_n/n = H(X)/\log q + \delta_n$, where $\delta_n \to 0$, it follows that
$R = \lceil (m_n - k_n) \log q \rceil / n = \lceil (H(X) + \delta_n \log q) n - L_n n \log|\mathcal{X}| \rceil / n$,
which is arbitrarily close to the rate in
(2) for sufficiently large n. ■

The source coding scheme used in the proof of Proposition 2
can be any asymptotically optimal scheme. Note
that if the source X is uniform, and assuming without
loss of generality that $L_n = L$ and that $Ln$ is an integer,
any message in the coset of $\mathcal{C}$ determined by $S^{n - Ln}$ is
equally likely. Hence, $H(X^n \mid S^{n - Ln}) = \log q^{Ln} = Ln \log q$.
Scheme 1 provides a systematic way of hiding information, and
we can take advantage of the properties of the underlying
linear code to make precise assertions regarding the
"information leakage" of the scheme.

With the syndrome in hand, how can we recover the
rest of the message? One possible approach is to find
a $k \times n$ matrix $\mathbf{D}$ that has full rank such that the rows
of $\mathbf{D}$ and $\mathbf{H}$ form a basis of $\mathbb{F}_q^n$. Such a matrix can
be easily found, for example, using the Gram-Schmidt
process with the rows of $\mathbf{H}$ as a starting point. Then we
simply calculate $T^k = \mathbf{D} X^n$ and forward $T^k$ to the
receiver. The receiver can then invert the system

\[ \begin{bmatrix} \mathbf{H} \\ \mathbf{D} \end{bmatrix} X^n = \begin{bmatrix} S^{n-k} \\ T^k \end{bmatrix} \tag{6} \]

and recover the original sequence $X^n$. This property
allows list-source codes to be deployed in practice using
well known linear code constructions, such as
Reed-Solomon or LDPC.
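
The recovery step can be sketched as follows. Note the swapped-in technique: instead of Gram-Schmidt, this sketch completes the rows of H to a basis by greedily adding unit vectors and testing the rank with Gaussian elimination over a prime field GF(p). The helper names (`matvec`, `reduce_rows`, `complete_basis`), the field GF(5), and the toy matrix are assumptions for illustration; H is assumed to have full row rank, and `pow(a, -1, p)` requires Python 3.8 or later.

```python
def matvec(M, x, p):
    """Multiply matrix M (list of rows) by vector x over GF(p)."""
    return [sum(a * b for a, b in zip(row, x)) % p for row in M]

def reduce_rows(rows, p):
    """Gauss-Jordan elimination over GF(p); returns (reduced rows, rank)."""
    m = [[v % p for v in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        piv = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][col], -1, p)          # modular inverse
        m[rank] = [(v * inv) % p for v in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return m, rank

def complete_basis(H, p):
    """Pick unit vectors that extend the rows of H to a basis of GF(p)^n."""
    n = len(H[0])
    rows, D = [list(r) for r in H], []
    for i in range(n):
        e = [int(j == i) for j in range(n)]
        if reduce_rows(rows + [e], p)[1] > len(rows):
            rows.append(e)
            D.append(e)
    return D

# Toy parameters (assumptions for illustration only).
p = 5
H = [[1, 1, 1, 1], [1, 2, 3, 4]]
x = (3, 0, 4, 1)

D = complete_basis(H, p)
s, t = matvec(H, x, p), matvec(D, x, p)          # syndrome and T = D x
augmented = [row + [v] for row, v in zip(H + D, s + t)]
recovered = tuple(row[-1] for row in reduce_rows(augmented, p)[0])
print(D, recovered)
```

Because the stacked matrix is invertible, reducing the augmented system leaves the identity on the left and the original sequence in the last column.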

Remark 2. This approach is valid for general linear
spaces, and holds for any pair of full rank matrices $\mathbf{H}$ and
$\mathbf{D}$ with dimensions $(n - k) \times n$ and $k \times n$, respectively,
such that $\mathrm{rank}([\mathbf{H}^T \mathbf{D}^T]^T) = n$. However, here we adopt
the nomenclature of linear codes since we make use of
known code constructions to design secrecy schemes in
the following sections.

C. A secure communication scheme based on list-source 
codes 

In this section we present a general description of a 
two-phase secure communication scheme for the model 
introduced in section I-B, presented in terms of the
list-source code constructions derived using linear codes.
Note that this scheme can be easily extended to any 
list-source code by using the corresponding encod- 
ing/decoding functions instead of multiplication by par- 
ity check matrices. 

We assume that Alice and Bob have access to an
encryption/decryption scheme (Enc', Dec') that is used
with the shared secret key K and is sufficiently secure
against the adversary. This scheme can be, for example,
a one-time pad. The functions (Enc', Dec') are used as
components of the overall encryption scheme (Enc, Dec),
which proceeds as follows.

Scheme 2. Input: The source encoded sequence $X^n \in \mathbb{F}_q^n$,
parity check matrix $\mathbf{H}$ of a linear code in $\mathbb{F}_q^n$, a
full-rank $k \times n$ matrix $\mathbf{D}$ such that $\mathrm{rank}([\mathbf{H}^T \mathbf{D}^T]) = n$, and
encryption/decryption functions (Enc', Dec').
Encryption (Enc):

Phase I (pre-caching): Alice generates $S^{n-k} = \mathbf{H} X^n$
and sends it to Bob.

Phase II (send encrypted data): Alice generates
$E^k = \mathrm{Enc}'(\mathbf{D} X^n, K)$ and sends it to Bob.
Decryption (Dec): Bob calculates $\mathbf{D} X^n = \mathrm{Dec}'(E^k, K)$ and
recovers $X^n$ from $S^{n-k}$ and $\mathbf{D} X^n$.
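
The two phases of Scheme 2 can be sketched end to end. Everything concrete here is an assumption for illustration: the field GF(5), the toy matrices H and D, the additive one-time pad over GF(5) standing in for (Enc', Dec'), and the brute-force coset search that Bob uses in place of inverting the linear system.

```python
import secrets
from itertools import product

P = 5                                   # toy prime field GF(5)
H = [[1, 1, 1, 1], [1, 2, 3, 4]]        # parity check matrix, (n - k) x n
D = [[1, 0, 0, 0], [0, 1, 0, 0]]        # completes H to a basis of GF(5)^4

def matvec(M, x):
    """Multiply matrix M by vector x over GF(P)."""
    return tuple(sum(a * b for a, b in zip(row, x)) % P for row in M)

def phase1(x):
    """Phase I: key-independent list-source encoding (the syndrome)."""
    return matvec(H, x)

def phase2(x, key):
    """Phase II: one-time-pad encryption of D x, one key symbol per entry."""
    return tuple((t + k) % P for t, k in zip(matvec(D, x), key))

def decrypt(s, e, key):
    """Bob removes the pad, then resolves the list given by the syndrome."""
    t = tuple((c - k) % P for c, k in zip(e, key))
    coset = [x for x in product(range(P), repeat=len(H[0])) if matvec(H, x) == s]
    matches = [x for x in coset if matvec(D, x) == t]
    assert len(matches) == 1            # [H; D] invertible, so the match is unique
    return matches[0]

key = tuple(secrets.randbelow(P) for _ in D)   # k uniformly random key symbols
x = (3, 0, 4, 1)
s = phase1(x)                            # can be cached before any key exists
e = phase2(x, key)
recovered = decrypt(s, e, key)
print(recovered)
```

Only k of the n symbols' worth of data passes through the keyed primitive, which is the point of the hybrid design.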

Assuming that (Enc', Dec') is secure, the security of
scheme 2 reduces to the security of the underlying
list-source code (i.e. scheme 1). In practice, the
encryption/decryption functions (Enc', Dec') may depend on
a secret or public/private key, as long as they provide
sufficient security for the desired application. In addition,
assuming that the source sequence is uniform and i.i.d.
in $\mathbb{F}_q$, we can use MDS codes to make strong security
guarantees, as described in the next section. In this
case, an adversary that observes $S^{n-k}$ cannot infer any
information about any set of k symbols of the original
message.

Note that this scheme has a tunable level of secrecy: 
The amount of data sent in phase I and phase II can 
be appropriately selected to match the properties of the 
encryption scheme available, the key length,
and the desired level of secrecy. Furthermore, when the 
encryption procedure has a higher computational cost 
than the list-source encoding/decoding operations, list- 
source codes can be used to reduce the total number of 
operations required by allowing encryption of a smaller 
portion of the message (phase II). 

IV. New metrics for security analysis 

We introduce a new information-theoretic metric for
security called $\epsilon$-symbol secrecy. This metric can be
used to characterize the properties of security schemes
that do not provide absolute secrecy (such as scheme 2).
Given a source sequence $X^n$ and its corresponding
encryption Y, $\epsilon$-symbol secrecy is the largest fraction
$t/n$ such that at most $\epsilon$ bits per symbol can be inferred
from any t-symbol subsequence of $X^n$. We derive a fundamental
bound for $\epsilon$-symbol secrecy, and show that it can be
achieved using MDS codes for $\epsilon = 0$ and uniform i.i.d.
sources. Before presenting the definition, we make a
few comments on notation and briefly review the threat
model.

A. Notation 

Let $C_n$ be a sequence of codes for a discrete
memoryless source X with probability distribution $p(x)$ that
achieves a rate list pair $(R, L)$. Furthermore, let $Y^{nR_n}$
be the corresponding codeword $f_n(X^n)$ created by $C_n$.
Denote by $\mathcal{I}_n(t)$ the set of all subsets of $\{1, \dots, n\}$ of
size t, i.e. $\mathcal{J} \in \mathcal{I}_n(t) \Leftrightarrow \mathcal{J} \subseteq \{1, \dots, n\}$ and $|\mathcal{J}| = t$.
In addition, we denote by $X^{(\mathcal{J})}$ the set of symbols of
$X^n$ indexed by the elements in the set $\mathcal{J} \subseteq \{1, \dots, n\}$.

As discussed in section I-B, we assume a passive
but computationally unbounded adversary that only has
access to the list-source encoded message
$f_n(X^n) = Y^{nR_n}$. Based on the observation of $Y^{nR_n}$, the adversary
will attempt to determine what is the original message.
In addition, we assume that the source statistics and
the list-source code used are universally known, i.e. an
adversary has access to the distribution $p_{X^n}(X^n)$ of
the symbol sequences produced by the source and the
sequence of codes $C_n$. We use the standard
information-theoretic approach of measuring the amount of
information that an adversary can gain of a specific sequence of
source symbols $X^{(\mathcal{J})}$ by observing $Y^{nR_n}$ as the mutual
information $I(X^{(\mathcal{J})}; Y^{nR_n})$.

B. Symbol Secrecy 

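The following definition introduces two security
metrics, namely absolute symbol secrecy and $\epsilon$-symbol
secrecy.

Definition 4. We define the absolute symbol
secrecy $\mu_0(C_n)$ of a code $C_n$ as

\[ \mu_0(C_n) = \max\Big\{ \frac{t}{n} : I(X^{(\mathcal{J})}; Y^{nR_n}) = 0 \ \ \forall \mathcal{J} \in \mathcal{I}_n(t) \Big\}. \tag{7} \]

The absolute symbol secrecy $\mu_0$ of a sequence of codes
$C_n$ is:

\[ \mu_0 = \liminf_{n \to \infty} \mu_0(C_n). \tag{8} \]

Furthermore, we define the $\epsilon$-symbol secrecy $\mu_\epsilon$ of a code
$C_n$ as

\[ \mu_\epsilon(C_n) = \max\Big\{ \frac{t}{n} : \frac{1}{t} I(X^{(\mathcal{J})}; Y^{nR_n}) \le \epsilon \ \ \forall \mathcal{J} \in \mathcal{I}_n(t) \Big\}, \tag{9} \]

and the $\epsilon$-symbol secrecy of a sequence of codes $C_n$ as

\[ \mu_\epsilon = \liminf_{n \to \infty} \mu_\epsilon(C_n), \tag{10} \]

where $\epsilon < H(X)$.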
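
For very small codes, absolute symbol secrecy can be computed by exhaustive search. The sketch below assumes a uniform source over {0, ..., q-1}^n and a deterministic encoder, and uses the fact that zero mutual information is equivalent to statistical independence; the function names and the two toy encoders are illustrative assumptions, not constructions from the text.

```python
from collections import Counter
from itertools import combinations, product

def is_independent(pairs):
    """True iff the two coordinates of equally likely (a, b) pairs are independent."""
    total = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return all(joint[(a, b)] * total == left[a] * right[b]
               for a in left for b in right)

def absolute_symbol_secrecy(encode, n, q):
    """Largest t/n such that every t-subset of symbols is independent of encode(x)."""
    messages = list(product(range(q), repeat=n))
    best = 0
    for t in range(1, n + 1):
        ok = all(
            is_independent([(tuple(x[j] for j in J), encode(x)) for x in messages])
            for J in combinations(range(n), t)
        )
        if not ok:
            break           # leakage cannot vanish again for larger subsets
        best = t
    return best / n

# Encoder that reveals the first two symbols (the trivial scheme, n = 4, q = 3).
prefix_secrecy = absolute_symbol_secrecy(lambda x: x[:2], 4, 3)
# Encoder that reveals a single parity symbol (n = 3, q = 3).
parity_secrecy = absolute_symbol_secrecy(lambda x: (sum(x) % 3,), 3, 3)
print(prefix_secrecy, parity_secrecy)
```

The prefix encoder hides no symbol perfectly, while the single-parity encoder hides any two of the three symbols, even though both produce lists of exponential size.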
Proposition 3. Let $C_n$ be a sequence of list-source codes
that achieves a rate-list pair $(R, L)$ and an $\epsilon$-symbol
secrecy of $\mu_\epsilon$. Then
$0 \le \mu_\epsilon \le \min\Big\{ \frac{L \log|\mathcal{X}|}{H(X) - \epsilon}, 1 \Big\}$.

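Proof: We denote $\mu_\epsilon(C_n) = \mu_{\epsilon,n}$. Note that, for any
$\mathcal{J} \in \mathcal{I}_n(n \mu_{\epsilon,n})$,

\[ H(X^{(\mathcal{J})} \mid Y^{nR_n}) = H(X^{(\mathcal{J})}) - I(X^{(\mathcal{J})}; Y^{nR_n}) \ge n \mu_{\epsilon,n} (H(X) - \epsilon). \]

Therefore

\[ \mu_{\epsilon,n} (H(X) - \epsilon) \le \frac{1}{n} H(X^{(\mathcal{J})} \mid Y^{nR_n}) \le \frac{1}{n} H(X^n \mid Y^{nR_n}) \le L_n \log|\mathcal{X}|. \]

The result follows by taking $n \to \infty$. ■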
The previous result bounds the amount of information
an adversary gains about particular source symbols by
observing a list-source encoded message. In particular,
for $\epsilon = 0$, we find a meaningful bound on the
largest fraction of input symbols that is perfectly hidden.
A simple upper bound for the maximum average amount
of information that an adversary can gain from a message
encoded with any source code $C_n$ with symbol secrecy
$\mu_{\epsilon,n}$ is given below.

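Proposition 4. For any code $C_n$ for a discrete
memoryless source X and any $\epsilon$ such that $0 \le \epsilon \le H(X)$, we
have

\[ \frac{1}{n} I(X^n; Y^{nR_n}) \le H(X) - \mu_{\epsilon,n} (H(X) - \epsilon), \tag{11} \]

where $\mu_{\epsilon,n} = \mu_\epsilon(C_n)$.

Proof: Let $\mu_{\epsilon,n} = t/n$, $\mathcal{J} \in \mathcal{I}_n(t)$ and
$\bar{\mathcal{J}} = \{1, \dots, n\} \setminus \mathcal{J}$. Then

\begin{align}
\frac{1}{n} I(X^n; Y^{nR_n})
  &= \frac{1}{n} I(X^{(\mathcal{J})}; Y^{nR_n}) + \frac{1}{n} I(X^{(\bar{\mathcal{J}})}; Y^{nR_n} \mid X^{(\mathcal{J})}) \tag{12} \\
  &\le \mu_{\epsilon,n}\, \epsilon + \frac{n - t}{n} H(X) \tag{13} \\
  &= H(X) - \mu_{\epsilon,n} (H(X) - \epsilon). \tag{14}
\end{align}

■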
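
As a worked instance of (11) and of the bound in Proposition 3, consider a uniform source over $\mathbb{F}_q$ with $q = 256$ and $\epsilon = 0$. The specific numbers are assumptions chosen here for illustration.

```latex
\begin{align*}
H(X) &= \log_2 |\mathcal{X}| = \log_2 256 = 8 \text{ bits}, \\
\mu_0 &\le \frac{L \log_2 |\mathcal{X}|}{H(X)} = L
  && \text{(Proposition 3 with } \epsilon = 0\text{)}, \\
\tfrac{1}{n} I(X^n; Y^{nR_n}) &\le 8 - \tfrac{1}{4} \cdot 8 = 6 \text{ bits per symbol}
  && \text{(inequality (11) with } \mu_{0,n} = \tfrac{1}{4}\text{)}.
\end{align*}
```

In words, a code over this source that perfectly hides every quarter of the symbols can leak at most 6 of the 8 bits per symbol on average.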

The next proposition relates the rate-list function with
$\epsilon$-symbol secrecy through the upper bound in proposition 3.

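Proposition 5. If a sequence of list-source codes $C_n$
achieves a point $(R', L)$ with
$\mu_\epsilon = \frac{L \log|\mathcal{X}|}{H(X) - \epsilon}$ for some $\epsilon$,
where $R' = \lim_{n \to \infty} \frac{1}{n} H(Y^{nR_n})$, then $R' = R(L)$.

Proof: Assume that $C_n$ satisfies the conditions in the
proposition and $\delta > 0$ is given. Then for n sufficiently
large, we have from (11):

\begin{align*}
\frac{1}{n} H(Y^{nR_n}) &= \frac{1}{n} I(X^n; Y^{nR_n}) \\
  &\le H(X) - \mu_\epsilon (H(X) - \epsilon) + \delta \\
  &= H(X) - L \log|\mathcal{X}| + \delta.
\end{align*}

Since this holds for any $\delta$, then $R' \le H(X) - L \log|\mathcal{X}|$.
However, from proposition 1, $R' \ge H(X) - L \log|\mathcal{X}|$,
and the result follows. ■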

C. A scheme based on MDS codes 

We now prove that for a uniform i.i.d. source X in
$\mathbb{F}_q$, using scheme 1 with an MDS parity check matrix
$\mathbf{H}$ achieves the upper bound on $\mu_0$. Since the source is
uniform and i.i.d., no source coding is used.

Proposition 6. If $\mathbf{H}$ is the parity check matrix of an
$(n, k, d)$ MDS code and the source $X^n$ is uniform and i.i.d.,
then Scheme 1 achieves the upper bound $\mu_0 = L$, where
$L = k/n$.

Proof: Let $\mathbf{H}$ be the parity check matrix of a
$(n, k, n - k + 1)$ MDS code $\mathcal{C}$ over $\mathbb{F}_q$, and let $x \in \mathcal{C}$. Fix
a set $\mathcal{J} \in \mathcal{I}_n(k)$ of k positions of x, denoted $x^{(\mathcal{J})}$. Since
the minimum distance of $\mathcal{C}$ is $n - k + 1$, for any other
codeword $z \in \mathcal{C}$ we have $z^{(\mathcal{J})} \ne x^{(\mathcal{J})}$. Denoting by
$\mathcal{C}^{(\mathcal{J})} = \{ x^{(\mathcal{J})} \in \mathbb{F}_q^k : x \in \mathcal{C} \}$, then $|\mathcal{C}^{(\mathcal{J})}| = |\mathcal{C}| = q^k$.
Therefore, $\mathcal{C}^{(\mathcal{J})}$ contains all possible combinations of k
symbols. Since this property also holds for any coset of
$\mathbf{H}$, the result follows.

■ 
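
The counting argument in the proof can be checked numerically on a toy example. The sketch assumes a prime field GF(5) and a 2 x 4 Vandermonde parity check matrix with distinct evaluation points 1, 2, 3, 4 (any two of its columns are linearly independent, so the code it defines has minimum distance 3 = n - k + 1); these choices are illustrative and not taken from the text.

```python
from itertools import combinations, product

p, n = 5, 4
# Vandermonde rows a^0 and a^1 for a = 1..4: [[1, 1, 1, 1], [1, 2, 3, 4]].
H = [[pow(a, i, p) for a in range(1, n + 1)] for i in range(2)]
k = n - len(H)

def syndrome(x):
    """Compute H x over GF(p)."""
    return tuple(sum(h * v for h, v in zip(row, x)) % p for row in H)

def projections(s, J):
    """All values taken on positions J by the members of the coset with syndrome s."""
    return {tuple(x[j] for j in J)
            for x in product(range(p), repeat=n) if syndrome(x) == s}

# For every coset and every set J of k positions, count the distinct projections.
all_cosets_ok = all(
    len(projections(s, J)) == p ** k
    for s in product(range(p), repeat=len(H))
    for J in combinations(range(n), k)
)
print(all_cosets_ok)
```

Each coset has $q^k$ equally likely members under a uniform source, and each of the $q^k$ possible values on any k positions appears exactly once, so those k symbols remain uniformly distributed given the syndrome.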

V. List-source codes for general source models

The previous results hold for i.i.d. source models. 
However, for more general sources the analysis becomes 
significantly more convoluted, since multiple list-source 
encoded messages can reveal information about each 
other. Considering that encryption is performed over 
multiple blocks of source symbols, the list size will not 
necessarily grow if these blocks are correlated.

In general, given an output $\mathbf{X} = X_1, \dots, X_n$ of n
correlated source symbols, and using scheme 1, what
is observed by an eavesdropper is the coset-valued
sequence of random elements $\{\mathbf{H}(s_n(\mathbf{X}))\}$, $\mathbf{H}$ being the
parity check matrix. Since $\mathbf{X}$ is a correlated source of
symbols, there is no a priori reason to expect that the
coset-valued process will not be correlated. For example,
if $\mathbf{X}$ forms a Markov chain, then the coset-valued process
is a function of a Markov chain; although it will not,
in general, form a Markov chain itself, it will still
have correlations. These correlations could effectively
reduce the list size that an eavesdropper must search and,
consequently, reduce the effectiveness of the scheme.
Reducing or eliminating correlations in the coset-valued
process would counteract the impact of this vulnerability.

Different approaches can be taken to resolve this
issue. In general, the key to reducing the effect of
the correlation between codewords is to encode larger
block lengths. More precisely, let $\mathbf{X}_1, \mathbf{X}_2, \dots, \mathbf{X}_N$
be N blocks of symbols produced by a Markov
source, such that $\mathbf{X}_i \in \mathcal{X}^n$ and
$p(\mathbf{X}_1, \dots, \mathbf{X}_N) = p(\mathbf{X}_1) p(\mathbf{X}_2 | \mathbf{X}_1) \cdots p(\mathbf{X}_N | \mathbf{X}_{N-1})$.
Instead of encoding each block individually, the transmitter
can compute $Y^{nNR} = f(\mathbf{X}_1, \dots, \mathbf{X}_N)$.

The previous approach has the disadvantage of requir- 
ing long block lengths and possibly high implementation 
complexity. We note, however, that the encoding pro- 
cedure over multiple blocks does not necessarily have 
to be performed independently. For example, one
possible approach for overcoming edge-effect correlations
between codewords is to define $\mathbf{Y}_1 = f(\mathbf{X}_1, \mathbf{X}_2)$,
$\mathbf{Y}_2 = f(\mathbf{X}_2, \mathbf{X}_3)$, and so forth. This approach reduces
the edge effects of correlation between codewords, in
particular when the individual sequences $\mathbf{X}_i$ are already
significantly long.

We note that, when probabilistic encryption [3] is
required over multiple blocks, the source encoded sym- 
bols in scheme 1 can be combined with the output 
of a pseudorandom number generator (PRG) before 
being multiplied by the parity check matrix. This would 
provide the necessary randomization of the output. The 
initial seed of the PRG can then be transmitted to the 
legitimate receiver in phase II of scheme 2.
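
A minimal sketch of this randomization step, under loud assumptions: symbols live in the prime field GF(5), the combination with the PRG output is symbol-wise addition, and Python's `random.Random` is used only as a stand-in keystream generator. It is not cryptographically secure, and a real deployment would need a proper PRG; the function names are hypothetical.

```python
import random
import secrets

P = 5
H = [[1, 1, 1, 1], [1, 2, 3, 4]]        # toy parity check matrix

def keystream(seed, n):
    """Stand-in PRG output; random.Random is NOT cryptographically secure."""
    rng = random.Random(seed)
    return [rng.randrange(P) for _ in range(n)]

def mask(x, seed):
    """Add the keystream to the source-coded symbols over GF(P)."""
    return tuple((v + m) % P for v, m in zip(x, keystream(seed, len(x))))

def unmask(y, seed):
    """Subtract the same keystream to undo the masking."""
    return tuple((v - m) % P for v, m in zip(y, keystream(seed, len(y))))

def syndrome(x):
    """Multiply by the parity check matrix over GF(P)."""
    return tuple(sum(h * v for h, v in zip(row, x)) % P for row in H)

seed = secrets.randbits(64)              # would be sent in phase II
x = (3, 0, 4, 1)
masked = mask(x, seed)
s = syndrome(masked)                     # phase I message, now randomized
print(s, unmask(masked, seed) == x)
```

The receiver first recovers the masked sequence as in scheme 2 and then removes the mask using the seed delivered in phase II.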

VI. Applications and Practical 
Considerations 

The protocol outline presented in scheme 2 is useful
in different practical scenarios, which are discussed in 
the following sections. Most of the advantages of the 
suggested scheme stem from the fact that list-source 
codes are key-independent, allowing content to be dis- 
tributed when a key distribution infrastructure is not yet 
established, and providing an additional level of security 
if keys are compromised before phase II in scheme 2.

A. Content pre-caching 

As hinted earlier, list-source codes provide a secure 
mechanism for content pre-caching when a key infras- 
tructure has not yet been established. A large fraction 
of the data can be list-source coded and securely trans- 
mitted before the termination of the key distribution 
protocol. This is particularly significant in large networks 
with hundreds of mobile nodes, where key management 
protocols can require a significant amount of time to
complete [23]. Scheme 2 circumvents the communication
delays incurred by key compromise detection,
revocation and redistribution by allowing data to be 
efficiently distributed concurrently with the key distri- 
bution protocol, while maintaining a level of security 
determined by the underlying list-source code. 

B. Application to key distribution protocols 

List-source codes can also provide additional
robustness to key compromise. If the secret key is
compromised before phase II of scheme 2, the data will still
be as secure as the underlying list-source code. Even 
if a (computationally unbounded) adversary has perfect 
knowledge of the key, until the last part of the data is 
transmitted the best he can do is reduce the number 
of possible inputs to an exponentially large list. In 
contrast, if a stream cipher based on a pseudo-random 
number generator were used and the initial seed was 
leaked to an adversary, all the data transmitted up to 
the point where the compromise was detected would 
be vulnerable. The use of list-source codes provides an
additional, information-theoretic level of security to the
data up to the point where the last fraction of the message
is transmitted. This also allows decisions as to which
receivers will be allowed to decrypt the data to be
delayed until the very end of the transmission, providing
more time for detection of unauthorized receivers and 
allowing a larger flexibility in key distribution. 

In addition, if the level of security provided by the list- 
source code is considered sufficient and the key is com- 
promised before phase II, the key can be redistributed 
without the need of retransmitting the entire data. As 
soon as the keys are reestablished, the transmitter simply 
encrypts the remaining part of the data in phase II with 
the new key. 

C. Additional layer of security

We also highlight that list-source codes can be used 
to provide an additional layer of security to the underly- 
ing encryption scheme. The message can be list-source 
coded after encryption and transmitted in two phases, 
as in scheme 2. As argued in the previous point, this
provides additional robustness against key compromise, 
in particular when a compromised key can reveal a large 
amount of information about an incomplete message 
(e.g. stream ciphers). Consequently, list-source codes are 
a simple, practical way of augmenting the security of 
current encryption schemes. 

One example application is to combine list-source 
codes with stream ciphers, as noted in section V. The 
source-coded message can be initially encrypted using 
a pseudorandom number generator initialized with a
randomly selected seed, and then list-source coded. The
initial random seed would be part of the encrypted 
message sent in the final transmission phase. This setup 
has the advantage of augmenting the security of the 
underlying stream cipher, and provides randomization to 
the list-source coded message. In particular, if the LSC is 
based on MDS codes and assuming that the distribution 
of the plaintext is nearly uniform, strong information- 
theoretic symbol secrecy guarantees can be made about 
the transmitted data, as discussed in section IV. Even if
the underlying PRG is compromised, the message would 
still be secure. 

D. Adjustable level of secrecy 

List-source codes provide a tunable level of secrecy, 
i.e. the amount of security provided by the scheme can 
be adjusted according to the application of interest. This 
can be done by appropriately selecting the size of the 
list (L) of the underlying code, which determines the 
amount of uncertainty an adversary will have regarding 
the input message. In the proposed implementation using 
linear codes, this corresponds to choosing the size of the 
parity check matrix H, or, analogously, the parameters of 
the underlying error-correcting code. In terms of scheme 
2, a larger (respectively smaller) value of L will lead to
a smaller (larger) list-source coded message in phase I 
and a larger (smaller) encryption burden in phase II. 
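
The tradeoff can be tabulated for a concrete code length. The sketch below assumes the linear-code implementation with an (n, k) code over $\mathbb{F}_q$ and no source coding, so that phase I carries the n - k syndrome symbols and phase II carries k symbols; the function name `phase_sizes` and the parameter values (n = 255, q = 256) are illustrative assumptions.

```python
import math

def phase_sizes(n, k, q):
    """Sizes, in bits, of the two phases for an (n, k) linear code over F_q."""
    bits_per_symbol = math.log2(q)
    return {
        "L": k / n,                                          # normalized list size
        "phase1_bits": math.ceil((n - k) * bits_per_symbol), # syndrome, sent unencrypted
        "phase2_bits": math.ceil(k * bits_per_symbol),       # D x, must be encrypted
        "log2_list_size": k * bits_per_symbol,               # adversary's list size
    }

for k in (16, 64, 128):
    print(k, phase_sizes(255, k, 256))
```

Increasing k enlarges the adversary's list and the encrypted portion at the same rate, while shrinking what can be cached ahead of key distribution.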

VII. Conclusions 

In this paper we introduced the concept of list-source 
codes, which are codes that compress a source below 
its entropy rate. We derived fundamental bounds for the 
rate list region, and provided code constructions that 
achieve these bounds. List-source codes are a useful 
tool to understand how to perform encryption when the 
(random) key length is smaller than the message entropy.
In a nutshell, when the key is small, we can reduce an 
adversary's uncertainty to a near-uniformly distributed 
list of possible source sequences with an exponential (in 
terms of the key length) number of elements by using 
list-source codes. We also demonstrated how list-source
codes can be implemented using standard linear codes. 

Furthermore, a new information-theoretic metric of 
secrecy was presented, namely $\epsilon$-symbol secrecy, which
characterizes the amount of information leaked about 
specific symbols of the source given an encoded version 
of the message. We derived fundamental bounds for
$\epsilon$-symbol secrecy, and showed how these bounds can be
achieved using MDS codes when the source is uniformly 
distributed. Finally, we discussed how list-source codes 
can be applied to practical encryption schemes.